
    SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding

    Remote sensing images are useful for a wide variety of planet monitoring applications, from tracking deforestation to tackling illegal fishing. The Earth is extremely diverse -- the number of potential tasks in remote sensing images is massive, and the sizes of features range from several kilometers to just tens of centimeters. However, creating generalizable computer vision methods is a challenge in part due to the lack of a large-scale dataset that captures these diverse features for many tasks. In this paper, we present SatlasPretrain, a remote sensing dataset that is large in both breadth and scale, combining Sentinel-2 and NAIP images with 302M labels under 137 categories and seven label types. We evaluate eight baselines and a proposed method on SatlasPretrain, and find that there is substantial room for improvement in addressing research challenges specific to remote sensing, including processing image time series that consist of images from very different types of sensors, and taking advantage of long-range spatial context. Moreover, we find that pre-training on SatlasPretrain substantially improves performance on downstream tasks, increasing average accuracy by 18% over ImageNet and 6% over the next best baseline. The dataset, pre-trained model weights, and code are available at https://satlas-pretrain.allen.ai/. Comment: ICCV 2023
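
    Since the abstract notes that pre-trained model weights are released, a minimal fine-tuning sketch may help make the transfer-learning claim concrete. The checkpoint path, the ResNet-50 stand-in backbone, and the 10-class head below are assumptions for illustration only; the actual SatlasPretrain model code and weights are documented on the project site linked above.

```python
# Hypothetical sketch: fine-tuning a backbone initialized from released
# pre-trained weights on a small downstream remote sensing task.
# "satlas_backbone.pth" and the 10-class head are assumptions, not the
# project's actual artifact names.
import torch
import torch.nn as nn
from torchvision.models import resnet50

backbone = resnet50()  # stand-in architecture for a pre-trained backbone
# state = torch.load("satlas_backbone.pth", map_location="cpu")  # assumed checkpoint
# backbone.load_state_dict(state, strict=False)

backbone.fc = nn.Linear(backbone.fc.in_features, 10)  # new task-specific head

optimizer = torch.optim.AdamW(backbone.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One fine-tuning step on a batch of downstream images."""
    optimizer.zero_grad()
    loss = criterion(backbone(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```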

    Machine-Assisted Map Editing

    Mapping road networks today is labor-intensive. As a result, road maps have poor coverage outside urban centers in many countries. Systems to automatically infer road network graphs from aerial imagery and GPS trajectories have been proposed to improve coverage of road maps. However, because of high error rates, these systems have not been adopted by mapping communities. We propose machine-assisted map editing, where automatic map inference is integrated into existing, human-centric map editing workflows. To realize this, we build Machine-Assisted iD (MAiD), where we extend the web-based OpenStreetMap editor, iD, with machine-assistance functionality. We complement MAiD with a novel approach for inferring road topology from aerial imagery that combines the speed of prior segmentation approaches with the accuracy of prior iterative graph construction methods. We design MAiD to tackle the addition of major, arterial roads in regions where existing maps have poor coverage, and the incremental improvement of coverage in regions where major roads are already mapped. We conduct two user studies and find that, when participants are given a fixed time to map roads, they are able to add as much as 3.5x more roads with MAiD.
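
    The segmentation-to-graph step mentioned above can be illustrated with a generic sketch (not the paper's exact method): skeletonize a predicted road mask and connect adjacent skeleton pixels into a graph. The function name and the use of scikit-image and networkx are assumptions for illustration.

```python
# Illustrative sketch: converting a binary road segmentation mask into a
# road-network graph. Real systems add pruning, smoothing, and topology
# refinement on top of a step like this.
import numpy as np
import networkx as nx
from skimage.morphology import skeletonize

def mask_to_graph(mask: np.ndarray) -> nx.Graph:
    """mask: 2D boolean array where True marks predicted road pixels."""
    skel = skeletonize(mask)
    h, w = skel.shape
    graph = nx.Graph()
    ys, xs = np.nonzero(skel)
    for y, x in zip(ys, xs):
        graph.add_node((y, x))
        # connect each skeleton pixel to its 8-connected skeleton neighbors
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx2 = y + dy, x + dx
                if (dy or dx) and 0 <= ny < h and 0 <= nx2 < w and skel[ny, nx2]:
                    graph.add_edge((y, x), (ny, nx2))
    return graph
```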

    RoadTagger: Robust Road Attribute Inference with Graph Neural Networks

    Inferring road attributes such as lane count and road type from satellite imagery is challenging. Often, due to occlusion in satellite imagery and the spatial correlation of road attributes, a road attribute at one position on a road may only be apparent when considering far-away segments of the road. Thus, to robustly infer road attributes, the model must integrate scattered information and capture the spatial correlation of features along roads. Existing solutions that rely on image classifiers fail to capture this correlation, resulting in poor accuracy. We find this failure is caused by a fundamental limitation -- the limited effective receptive field of image classifiers. To overcome this limitation, we propose RoadTagger, an end-to-end architecture that combines Convolutional Neural Networks (CNNs) and Graph Neural Networks (GNNs) to infer road attributes. The graph neural network allows information propagation along the road network graph and eliminates the receptive field limitation of image classifiers. We evaluate RoadTagger on both a large real-world dataset covering a 688 km^2 area in 20 U.S. cities and a synthesized micro-dataset. In the evaluation, RoadTagger improves inference accuracy over CNN image classifier-based approaches. RoadTagger also demonstrates strong robustness against different disruptions in the satellite imagery and the ability to learn complicated inductive rules for aggregating scattered information along the road network.
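
    A schematic of the CNN-plus-GNN idea described above (not the authors' exact architecture) is sketched below: a small CNN encodes the image patch around each road graph vertex, and a few rounds of message passing along road edges let far-away evidence influence each vertex's attribute prediction. The layer sizes, GRU-based update, and mean aggregation via a normalized adjacency matrix are assumptions for illustration.

```python
# Schematic CNN + GNN for per-vertex road attribute inference (illustrative).
import torch
import torch.nn as nn

class NodeEncoder(nn.Module):
    """Encodes a per-vertex image patch (e.g. 64x64 RGB) into a feature vector."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, dim)

    def forward(self, patches):          # patches: [num_vertices, 3, H, W]
        return self.fc(self.conv(patches).flatten(1))

class RoadAttributeGNN(nn.Module):
    """Mean-aggregation message passing over the road graph, then a classifier."""
    def __init__(self, dim: int = 128, num_classes: int = 4, steps: int = 3):
        super().__init__()
        self.encoder = NodeEncoder(dim)
        self.update = nn.GRUCell(dim, dim)
        self.head = nn.Linear(dim, num_classes)   # e.g. lane-count classes
        self.steps = steps

    def forward(self, patches, adj):     # adj: [V, V] row-normalized adjacency
        h = self.encoder(patches)
        for _ in range(self.steps):
            msg = adj @ h                # average neighbor features along roads
            h = self.update(msg, h)      # propagate information across the graph
        return self.head(h)              # per-vertex attribute logits
```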

    Robust road topology extraction from aerial imagery

    Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018. Cataloged from PDF version of thesis. Includes bibliographical references (pages 63-65). Creating and updating road maps is currently an expensive and often manual process, and thus maps today are outdated or have poor coverage in large regions of the world. Automatically inferring the road network graph from aerial imagery provides a promising avenue to reducing the cost of maintaining road maps, but existing inference methods have poor precision. This thesis develops a novel iterative graph construction process for extracting graph structures from images, and applies this process to automatic road topology inference to significantly reduce error rates. By Favyen Bastani, S.M.
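
    The iterative graph construction process can be sketched schematically as a search that repeatedly asks a learned decision function whether to extend the graph by a fixed step or to stop and backtrack. The step length, stack-based exploration, and the `decide` callback below are simplifications assumed for illustration; in the actual system the decision would come from a model looking at imagery around the current vertex.

```python
# Schematic iterative graph construction loop (simplified illustration).
import math

def trace_roads(start, decide, step_len=20.0, max_iters=10_000):
    """
    start:  (x, y) coordinate known to lie on a road.
    decide: callable(vertex, graph) -> angle in radians to walk next, or None
            to stop exploring from this vertex.
    Returns the traced graph as (vertices, edges).
    """
    vertices, edges = [start], []
    stack = [start]                       # vertices with unexplored outgoing roads
    for _ in range(max_iters):
        if not stack:
            break
        v = stack[-1]
        angle = decide(v, (vertices, edges))
        if angle is None:                 # no unexplored road at v: backtrack
            stack.pop()
            continue
        u = (v[0] + step_len * math.cos(angle), v[1] + step_len * math.sin(angle))
        vertices.append(u)
        edges.append((v, u))
        stack.append(u)                   # continue tracing from the new vertex
    return vertices, edges
```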

    Label-Efficient and Compute-Efficient Video Analytics

    The ability to analyze large-scale video datasets is useful in an increasing range of applications. For example, a traffic planner may want to analyze traffic camera video to compare the frequency of hard braking at different junctions, while an ecology researcher may be interested in identifying instances of various behaviors between pairs of birds in video of a bird feeder. However, implementing machine learning (ML) pipelines for video analytics tasks remains challenging for two reasons. First, these tasks generally require applying expensive ML models to robustly detect and track objects such as cars and birds. These models are both label-intensive, often requiring thousands of labeled examples to achieve high accuracy, and compute-intensive, executing at only tens of frames per second even on datacenter GPUs. Second, in addition to applying ML models, these tasks often require several auxiliary operations to pre-process the input video and associated metadata, and to post-process model outputs to extract useful insights. For example, counting hard braking incidents necessitates post-processing object tracks of cars to identify sharp decelerations. In this thesis, we present SkyhookML, a platform for analytics tasks over large-scale video datasets. To reduce the cost of video analytics, we integrate approximate video query processing optimizations, efficient video pre-processing methods, and self-supervised learning techniques into SkyhookML. Approximate processing optimizations sacrifice a small amount of accuracy for large gains in throughput by avoiding applying the most accurate but also most expensive models on every video frame. Efficient pre-processing methods extract general-purpose insights from video that can be reused across several analytics tasks. Self-supervised learning techniques can substantially reduce the labeling effort needed to train robust models by deriving learning signals from unlabeled data. By employing novel approaches in each of these three categories, specialized for analyzing the object detections and tracks that appear in video data, SkyhookML addresses the label- and compute-intensiveness of video analytics and enables users to efficiently develop and deploy ML pipelines. Ph.D. thesis.
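
    The approximate-processing idea above can be illustrated with a minimal cascade sketch, under the assumption of two user-supplied models: a cheap per-frame scoring function and an expensive detector. The function and parameter names are assumptions for illustration, not part of SkyhookML's API.

```python
# Minimal cascade sketch: run the expensive detector only on frames that a
# cheap filter flags, trading a little accuracy for a large drop in compute.
from typing import Callable, Iterable, List, Tuple

def cascade(frames: Iterable, cheap_score: Callable, detect: Callable,
            threshold: float = 0.5) -> List[Tuple[int, object]]:
    results = []
    for i, frame in enumerate(frames):
        if cheap_score(frame) >= threshold:      # cheap filter on every frame
            results.append((i, detect(frame)))   # expensive model on survivors
    return results
```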